Unveiling the Hidden: Online Vectorized HD Map Construction with Clip-Level Token Interaction and Propagation
Predicting and constructing road geometric information (e.g., lane lines, road markers) is a crucial task for safe autonomous driving, yet such static map elements can be repeatedly occluded by dynamic objects on the road. Recent studies have significantly improved vectorized high-definition (HD) map construction, but temporal information across adjacent input frames (i.e., clips) remains insufficiently exploited, which can lead to inconsistent and suboptimal predictions. To tackle this, we introduce MapUnveiler, a novel paradigm of clip-level vectorized HD map construction that explicitly unveils occluded map elements within a clip by relating dense image representations with efficient clip tokens.
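The clip-token mechanism the abstract describes — a small set of tokens that summarize dense per-frame features and are carried across a clip — can be illustrated with a minimal numpy sketch. This is a hypothetical simplification, not MapUnveiler's actual architecture; the token count, feature dimension, and single cross-attention update are illustrative assumptions:

```python
import numpy as np

def cross_attention(queries, keys_values, dim):
    """Scaled dot-product attention: queries attend to keys_values.
    queries: (Q, dim), keys_values: (N, dim). Returns (Q, dim)."""
    scores = queries @ keys_values.T / np.sqrt(dim)               # (Q, N)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)                 # softmax over N
    return weights @ keys_values

rng = np.random.default_rng(0)
dim, num_tokens = 32, 8
clip_tokens = rng.normal(size=(num_tokens, dim))  # compact clip-level state

# A "clip" of 3 frames, each with dense (flattened) image/BEV features.
for frame_feats in [rng.normal(size=(100, dim)) for _ in range(3)]:
    # Clip tokens summarize the current frame's dense features; in the paper
    # they would also refine those features and propagate to the next clip —
    # here we only carry the tokens forward.
    clip_tokens = cross_attention(clip_tokens, frame_feats, dim)

print(clip_tokens.shape)  # (8, 32)
```

The point of the design is cost: the tokens stay small (8 x 32 here) regardless of how dense the per-frame representation is, so temporal propagation is cheap.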
Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation
Konstantinidis, Fabian, Sackmann, Moritz, Hofmann, Ulrich, Stiller, Christoph
Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individual traffic participants. To improve efficiency, we adopt an instance-centric scene representation, where each traffic participant and map element is modeled in its own local coordinate frame. This design enables efficient, viewpoint-invariant scene encoding and allows static map tokens to be reused across simulation steps. To model interactions, we employ a query-centric symmetric context encoder with relative positional encodings between local frames. We use Adversarial Inverse Reinforcement Learning to learn the behavior model and propose an adaptive reward transformation that automatically balances robustness and realism during training. Experiments demonstrate that our approach scales efficiently with the number of tokens, significantly reducing training and inference times, while outperforming several agent-centric baselines in terms of positional accuracy and robustness.
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- North America > United States (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
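The instance-centric representation above hinges on relative positional encodings between local coordinate frames: each participant is described in its own frame, and pairwise geometry is expressed relative to the query's frame, making the encoding invariant to the global viewpoint. A minimal sketch of that relative encoding (the exact features the paper encodes are an assumption here):

```python
import math

def relative_encoding(pose_i, pose_j):
    """Relative pose of j expressed in i's local frame.
    pose = (x, y, heading). Returns (dx, dy, dtheta); rotating or
    translating the whole scene leaves the result unchanged."""
    xi, yi, ti = pose_i
    xj, yj, tj = pose_j
    dx, dy = xj - xi, yj - yi
    c, s = math.cos(-ti), math.sin(-ti)
    dtheta = (tj - ti + math.pi) % (2 * math.pi) - math.pi  # wrap to (-pi, pi]
    return (c * dx - s * dy, s * dx + c * dy, dtheta)

a = (0.0, 0.0, 0.0)
b = (3.0, 4.0, math.pi / 2)
print(relative_encoding(a, b))  # (3.0, 4.0, 1.5707963267948966)

# Viewpoint invariance: rotate and shift the whole scene; the relative
# encoding is numerically unchanged.
def transform(pose, shift, rot):
    x, y, t = pose
    c, s = math.cos(rot), math.sin(rot)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], t + rot)

a2 = transform(a, (5.0, -2.0), math.pi / 2)
b2 = transform(b, (5.0, -2.0), math.pi / 2)
print(all(abs(p - q) < 1e-9
          for p, q in zip(relative_encoding(a, b),
                          relative_encoding(a2, b2))))  # True
```

This invariance is also what lets static map tokens be encoded once and reused across simulation steps: only the relative geometry to each moving agent changes.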
FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
Pyo, Jiyoon, Jiao, Yuankun, Jung, Dongwon, Li, Zekun, Jang, Leeje, Kirsanova, Sofia, Kim, Jina, Lin, Yijun, Liu, Qin, Xie, Junyi, Askari, Hadi, Xu, Nan, Chen, Muhao, Chiang, Yao-Yi
Cartographic reasoning is the skill of interpreting geographic relationships by aligning legends, map scales, compass directions, map texts, and geometries across one or more map images. Although essential both as a core cognitive capability and for critical tasks such as disaster response and urban planning, it remains largely unevaluated. Building on progress in chart and infographic understanding, recent large vision-language model (LVLM) studies on map visual question answering (VQA) often treat maps as a special case of charts. In contrast, map VQA demands comprehension of layered symbology (e.g., symbols, geometries, and text labels) as well as spatial relations tied to orientation and distance that often span multiple maps and are not captured by chart-style evaluations. To address this gap, we introduce FRIEDA, a benchmark for testing complex open-ended cartographic reasoning in LVLMs. FRIEDA sources real map images from documents and reports across various domains and geographical areas. Following classifications in the Geographic Information System (GIS) literature, FRIEDA targets all three categories of spatial relations: topological (border, equal, intersect, within), metric (distance), and directional (orientation). All questions require multi-step inference, and many require cross-map grounding and reasoning. We evaluate eleven state-of-the-art LVLMs under two settings: (1) the direct setting, where we provide the maps relevant to the question, and (2) the contextual setting, where the model may first have to identify the maps relevant to the question before reasoning. Even the strongest models, Gemini-2.5-Pro and GPT-5-Think, achieve only 38.20% and 37.20% accuracy, respectively, far below the human performance of 84.87%. These results reveal a persistent gap in multi-step cartographic reasoning, positioning FRIEDA as a rigorous benchmark to drive progress on spatial intelligence in LVLMs.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Africa > South Africa > Western Cape > Cape Town (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- (38 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology (0.88)
- Transportation > Ground > Road (0.66)
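The topological relations FRIEDA targets (border, equal, intersect, within) have crisp definitions in GIS. As a toy illustration, they can be computed for axis-aligned boxes in a few lines; real GIS systems use full polygon predicates such as the DE-9IM model, so this simplification is an assumption for exposition only:

```python
def topo_relation(a, b):
    """Classify the topological relation between two axis-aligned boxes
    (xmin, ymin, xmax, ymax): 'equal', 'within', 'border', 'intersect',
    or 'disjoint'. A toy stand-in for full GIS polygon predicates."""
    if a == b:
        return "equal"
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    if ax0 >= bx0 and ay0 >= by0 and ax1 <= bx1 and ay1 <= by1:
        return "within"
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "disjoint"
    if ax1 == bx0 or bx1 == ax0 or ay1 == by0 or by1 == ay0:
        return "border"  # boundaries touch, interiors do not overlap
    return "intersect"

print(topo_relation((0, 0, 2, 2), (1, 1, 3, 3)))  # intersect
print(topo_relation((0, 0, 2, 2), (2, 0, 4, 2)))  # border
print(topo_relation((1, 1, 2, 2), (0, 0, 3, 3)))  # within
```

The benchmark's difficulty is precisely that an LVLM must recover such relations from rendered symbology across maps, not from clean coordinates as here.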
SDTagNet: Leveraging Text-Annotated Navigation Maps for Online HD Map Construction
Immel, Fabian, Pauls, Jan-Hendrik, Fehler, Richard, Bieder, Frank, Merkert, Jonas, Stiller, Christoph
Autonomous vehicles rely on detailed and accurate environmental information to operate safely. High definition (HD) maps offer a promising solution, but their high maintenance cost poses a significant barrier to scalable deployment. This challenge is addressed by online HD map construction methods, which generate local HD maps from live sensor data. However, these methods are inherently limited by the short perception range of onboard sensors. To overcome this limitation and improve general performance, recent approaches have explored the use of standard definition (SD) maps, which are significantly easier to maintain, as priors. We propose SDTagNet, the first online HD map construction method that fully utilizes the information of widely available SD maps, such as OpenStreetMap, to enhance far-range detection accuracy. Our approach introduces two key innovations. First, in contrast to previous work, we incorporate not only polyline SD map data with manually selected classes, but also additional semantic information in the form of textual annotations. In this way, we enrich SD vector map tokens with NLP-derived features, eliminating the dependency on predefined specifications or exhaustive class taxonomies. Second, we introduce a point-level SD map encoder together with orthogonal element identifiers to uniformly integrate all types of map elements. Experiments on Argoverse 2 and nuScenes show that this boosts map perception performance by up to +5.9 mAP (+45%) w.r.t. map construction without priors and up to +3.2 mAP (+20%) w.r.t. previous approaches that already use SD map priors. Code is available at https://github.com/immel-f/SDTagNet
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Switzerland (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Transportation > Ground > Road (0.93)
- Transportation > Infrastructure & Services (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.48)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.34)
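The core idea of enriching SD vector map tokens with NLP-derived features can be sketched concretely. The hashed bag-of-words embedding below is a crude stand-in for the learned text encoder SDTagNet would use, and the tag names are illustrative OpenStreetMap-style examples, not the paper's exact inputs:

```python
import hashlib
import numpy as np

def text_embedding(tags, dim=16):
    """Hashed bag-of-words embedding of free-form map annotations — a crude
    stand-in for learned NLP features. Works for arbitrary key=value tags,
    so no predefined class taxonomy is needed."""
    vec = np.zeros(dim)
    for token in " ".join(f"{k}={v}" for k, v in sorted(tags.items())).split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def sd_map_token(polyline, tags, dim=16):
    """A point-level SD map token: per-point geometry plus shared text features."""
    pts = np.asarray(polyline, dtype=float)                   # (N, 2)
    txt = np.tile(text_embedding(tags, dim), (len(pts), 1))   # (N, dim)
    return np.concatenate([pts, txt], axis=1)                 # (N, 2 + dim)

tok = sd_map_token([(0, 0), (5, 0), (10, 1)],
                   {"highway": "residential", "maxspeed": "30"})
print(tok.shape)  # (3, 18)
```

Because the text channel accepts any annotation string, new tag vocabularies do not require retraining a class list — which is the dependency the paper says it eliminates.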
Supplementary Materials Online Map Vectorization for Autonomous Driving: A Rasterization Perspective
The base model takes surround-view images of the ego-vehicle as input. Figure 1 provides further visual comparisons of HD map vectorization results, including additional visualizations of MapVR's HD map construction; the results reaffirm the necessity of a rasterization perspective in map vectorization. As discussed in Section 3, the Chamfer-distance-based metric struggles to offer a fair evaluation in such scenarios.
- North America > United States > Oregon > Deschutes County > Bend (0.04)
- Asia > Singapore (0.04)
- Transportation > Ground > Road (0.42)
- Information Technology > Robotics & Automation (0.42)
- Automobiles & Trucks (0.42)
- Information Technology > Artificial Intelligence > Vision (0.94)
- Information Technology > Sensing and Signal Processing (0.68)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
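The Chamfer-distance weakness this record alludes to is easy to demonstrate. In the sketch below (a minimal illustration, not MapVR's evaluation code), a jagged prediction oscillating around a straight lane line keeps a small Chamfer distance because every predicted point lies near *some* ground-truth point, even though the rasterized shape is clearly wrong:

```python
import numpy as np

def chamfer(p, q):
    """Symmetric Chamfer distance between two 2-D point sets (N, 2), (M, 2)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Ground-truth lane boundary: a straight line sampled at 11 points.
gt = np.stack([np.linspace(0, 10, 11), np.zeros(11)], axis=1)

# A jagged prediction oscillating +-0.3 m around the line: every point is
# exactly 0.3 from its nearest ground-truth point, so the Chamfer distance
# stays small despite the visibly wrong shape.
jagged = gt + np.stack([np.zeros(11), 0.3 * (-1.0) ** np.arange(11)], axis=1)

print(round(chamfer(gt, jagged), 3))  # 0.6
```

A rasterization-based metric instead compares the rendered shapes directly, so the oscillation is penalized in proportion to the area it gets wrong — the motivation behind the rasterization perspective argued above.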